ABSTRACT 

Most research on Automatic Teller Machines (ATMs) focuses on security and ATM modeling; no study has 
considered extracting financial information from the electronic journal (EJ). ATM customer transactions are recorded in a 
semi-structured text file called the EJ, which makes it difficult to run a direct search query against it when resolving 
transaction disputes in banks. This research focuses on how to extract financial information from ATM EJs. An EJ Parser 
algorithm was developed to establish an information extraction (IE) method. The IE method applies a divide-and-conquer 
concept to decompose the EJ into sub-problems of unit transaction sessions, performs named entity recognition (NER) to 
identify all financial transaction tokens or entities, and adopts regular expressions (regex) as the entity classifier. The 
algorithm was tested with a collection of live EJ data from a Wincor ATM of a bank, and its performance was evaluated 
using standard metrics: precision, recall, accuracy, F-measure, and misclassification rate. The algorithm achieved 99% 
precision, 99.7% recall, and 99.7% accuracy. The few misclassifications that occurred were traced to the ‘comment’ and 
‘avail balance’ entities. 

KEYWORDS: Automatic Teller Machine (ATM), ATM State Chart, EJ Parser, ATM Electronic Journal (EJ), NDC, 
CEN XFS, NCR 

INTRODUCTION 

Khalifa and Saadan (2013) defined an automatic teller machine (ATM) as a computerized telecommunications 
device and real-time system that provides the clients of a financial institution with access to their bank accounts in a public 
space, without the intervention of the administration of the financial institution. These machines are found at most 
supermarkets, convenience stores, and travel centers (Bowen, 2000). Various brands of ATM, such as NCR, Wincor, 
Diebold, King Teller, and Hyosung, are deployed in the Nigerian banking industry. Generally, an ATM runs an operating 
system (OS) such as Windows or Linux, device drivers conforming to CEN XFS, and an ATM client application (e.g. Process 
or NCR Direct Connect). All ATMs deployed in Nigeria, however, run Microsoft Windows, either XP or Windows 7, and 
almost all of them have now been migrated to Windows 7. 

Wang, Zhang, Sheu, Li, and Guo (2010) described the ATM as a 5-tuple finite state machine (FSM), which 
assumes a set of states and a set of state transition functions. Wang et al. modeled the ATM using a transition diagram and 
a transition table. The ATM system comprises subunits such as the card reader, keypad, monitor, bill disburser (the unit 
that dispenses money), bill storage (which stores money), and system clock. All these subunits are connected to the ATM 
processor (which can be a personal computer, PC). Wang et al. (2010) conceptualized the ATM system with the model 
shown in Figure 1. 
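The transition-table view of an FSM can be sketched in a few lines of Python. This is only an illustration: the state and event names below are hypothetical and are not taken from Wang et al. (2010); the dictionary plays the role of the transition function, mapping a (state, event) pair to the next state.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical states and events, chosen for illustration only.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("IDLE", "CARD_INSERTED"): "WAITING_FOR_PIN",
    ("WAITING_FOR_PIN", "PIN_ENTERED"): "SELECTING_TRANSACTION",
    ("SELECTING_TRANSACTION", "AMOUNT_ENTERED"): "DISPENSING",
    ("DISPENSING", "CASH_TAKEN"): "EJECTING_CARD",
    ("EJECTING_CARD", "CARD_TAKEN"): "IDLE",
}


def step(state: str, event: str) -> str:
    """Return the next state, or raise if the event is not accepted in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not accepted in state {state!r}") from None


def run(events: Iterable[str], start: str = "IDLE") -> str:
    """Feed a sequence of events to the machine and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Feeding the five events in order returns the machine to its start state, while an out-of-order event (for example a PIN entry before a card is inserted) is rejected.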

Figure 1: The Conceptual Model of the ATM System 

(Source: Wang et al., 2010) 

ATM operation depends on events and states. An event may be a stream or combination of multiple signals, or 
just a single signal, and may be either human or system invoked; it is an atomic occurrence and has theoretically zero 
duration (Gomaa, 2011). Examples of events are card inserted, PIN entered, and shutter opened. Card inserted, for 
instance, always precedes PIN entered in the state flow. The state chart in Figure 2, adapted from Gomaa (2011), 
illustrates how events and states interrelate during ATM operation. The ATM requires a set of input signals (events) before 
a transition can occur, depending on its current state. The sequence of these states is defined in the state flow received 
from the ATM host server as part of the “ATM download”. At every transition there is usually an entry in the EJ detailing 
the transaction flow and the interaction between the ATM and the user; this is what produces the ATM electronic journal. 
The EJ contains both ATM messages and the financial transactions performed on the ATM. The area of concern here is 
the customer (financial) transactions, which are recorded on the ATM in a semi-structured text format as the electronic 
journal, or EJ as it is usually called in the industry. EJ samples adapted from Hyosung and Wincor ATMs are shown in 
Figure 3 (A) and (B), respectively. 



Figure 2: ATM State Chart (Adapted: Gomaa, 2011) 
[Figure 3 (A): Hyosung EJ sample. The sample shows timestamped ATM messages dated 29/12/2014 (TRANSACTION 
START, CAMERA - PICTURE TAKEN, REPLY RECV, PIN ENTERED, CASH DISPENSE - PRESENTED, CASH DISPENSE - 
ITEM TAKEN, TRANSACTION DATA); [TRANSACTION RECORD] blocks with the fields OPCode, Function ID, Card 
Number, Amount, Denomination, Request Count, Dispense Count, Remain Count, Pick-up Count, Reject Count, TransSEQ 
number, Error Code, Present Amount, Present Time, Taken Amount, and Taken Time; and a host message (JPR 
CONTENTS) giving the card number, the WITHDRAW amount, the source account, and the LEDGER and AVAIL 
balances.] 

Figure 3 (A) EJ Sample (Adapted From Hyosung ATM) 


— Host Message 


05:59:46 -> TRANSACTION START 
05:59:47 TRACK 2 DATA: 506123*********1040 
05:59:47 TRANSACTION REQUEST ACID 
05:59:48 TRANSACTION REPLY NEXT 018 FUNCTION 5000 
04\12\15 06:00 10442211 
506123 1040 

0130 ??? CARDEXPIRYALERT 

??? 

05:59:57 PIN ENTERED 
06:00:11 AMOUNT 1700000 ENTERED 
06:00:11 TRANSACTION REQUEST ACAAAB 

06:00:14 TRANSACTION REPLY NEXT 100 FUNCTION 2086 
06:00:14 TVR: 8000040000, TSI: 6000 
06:00:21 CASH REQUEST: 17000000 
06:00:21 CASH 1:1,17; 

06:00:25 CASH PRESENTED 

04\12\15 06:01 10442211 

506123 1040 

0131 000516444576 0041068415 
WITHDRAW NGN17000.00 

FROM ....8415 

LEDGER NGN1368.06 
AVAIL NGN1368.06 


— ATM Message 


06:00:27 CASH TAKEN 

06:00:32 CARD(506123*********1040) TAKEN 
06:00:36 <- TRANSACTION END 


Figure 3 (B) EJ Sample (Adapted From Wincor ATM) 


RELATED WORK 


At the time of writing, no research work specifically addresses the ATM electronic journal. There are, however, a 
few studies on extracting information from semi-structured and unstructured electronic documents. The process of 
extracting information from a text file is similar to the text mining process. Tan (1999) emphasized that text is the most 
natural form of storing information and that mining it has a higher commercial potential than data mining; Tan stated that 
80% of an organization's information is held in text documents. Text mining is, however, much more intricate than data 
mining because text is naturally unstructured. Tan proposed a framework consisting of two phases, as shown in Figure 4: 

• Text refining, which transforms text documents into an intermediate form (IF). 

• Knowledge distillation, which deduces patterns or knowledge from the IF. 



[Figure 4: Text refining turns text into a document-based intermediate form (feeding clustering, categorization, and 
visualization) or a concept-based intermediate form (feeding predictive modeling, associative discovery, and visualization); 
knowledge distillation operates on these intermediate forms.] 

Figure 4: Tan's Text Mining Framework 

(Source: Tan, 1999) 

Tan further views text mining as a collection of information retrieval, text analysis, information extraction, 
clustering, categorization, visualization, database technology, machine learning, and data mining. 

Garofalakis, Rastogi, and Shim (1999) proposed SPIRIT (Sequential Pattern mining with Regular expression 
constraints), a family of algorithms for sequential pattern mining. Conventional mining systems provide users with only a 
very restricted mechanism (based on minimum support) for specifying patterns of interest. Garofalakis et al. (1999) 
proposed regular expressions (REs) as a flexible constraint specification tool that allows user-controlled focus to be 
incorporated into the pattern mining process. The main distinguishing factor among the proposed schemes is the degree to 
which the RE constraints are enforced to prune the search space of patterns during computation. 

Kawtrakul and Yingsaeree (2005) proposed a unified framework for automatically extracting metadata from 
various forms of electronic documents, such as PDF, DOC, image, Excel, and text files, using regular expressions. Regular 
expressions have been a dominant practical IE method for several years (Li, Krishnamurthy, Raghavan, Vaithyanathan, 
and Jagadish, 2008), but creating a regular expression for complex information extraction tasks is time-consuming and 
tedious. Kawtrakul et al. designed a system that includes an optical character recognition (OCR) component, which 
extracts the content and converts it to a standard text format, after which the discovered knowledge is analyzed. 

SYSTEM DESIGN 

This paper focuses on how to extract customer transactions from ATM EJs. The research method uses 
information extraction (IE) to extract specific pieces of data from an EJ document. To devise the IE method, a context-free 
grammar (CFG) was used to analyze the production rule from which the ATM host messages in the EJ are derived. 
Regular expressions (regex) were used to extract named entities such as the card number or Primary Account Number 
(PAN), transaction type, transaction serial number (TSN), amount, comment, and transaction date and time. Extraction 
was easier and faster when the divide-and-conquer concept was applied to break the EJ content into transaction sessions. 
The extraction was then repeated for each transaction session. All these procedures were implemented in the algorithm 
design. 

Algorithm Design 

The algorithm design was segmented into two main parts: EJ decomposition and information extraction. 

EJ Decomposition 

EJ decomposition applies the divide-and-conquer paradigm to break an EJ file down into transaction sessions. 
Each session contains one or more transaction events, such as Withdraw, Inquiry, Transfer, Virtual top-up, and Bill 
payment, as defined by the bank, because a customer may perform multiple transactions within one cycle or session (from 
card insertion to card ejection). The transaction cycle is decomposed into events, and all the information pertaining to the 
customer transaction is extracted into a dataset (or session table) using NER. 
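The decomposition step can be sketched in Python as follows. This is a minimal illustration, not the authors' .NET implementation; it assumes the session delimiters look like the "-> TRANSACTION START" and "<- TRANSACTION END" lines of the Wincor sample in Figure 3 (B), and the names `SESSION_RE`, `split_sessions`, and `SAMPLE` are hypothetical.

```python
import re
from typing import List

# Assumed delimiters, modeled on the Wincor sample in Figure 3 (B).
# A session runs from a line containing "-> TRANSACTION START"
# to the end of the next line containing "<- TRANSACTION END".
SESSION_RE = re.compile(
    r"^[^\n]*-> TRANSACTION START[\s\S]*?<- TRANSACTION END[^\n]*",
    re.MULTILINE,
)


def split_sessions(ej_text: str) -> List[str]:
    """Return every transaction session found in the EJ text, in file order."""
    return [match.group(0) for match in SESSION_RE.finditer(ej_text)]


# Shortened, partly hypothetical journal text used only to exercise the function.
SAMPLE = (
    "05:59:46 -> TRANSACTION START\n"
    "05:59:57 PIN ENTERED\n"
    "06:00:36 <- TRANSACTION END\n"
    "06:01:00 line outside any session\n"
    "06:02:10 -> TRANSACTION START\n"
    "06:02:40 <- TRANSACTION END\n"
)
```

Lines that fall between two sessions are simply left out of the result, so each returned string can be handed to the extraction step on its own.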

The dataset is an accumulation of customer transaction events. Typically, an EJ file comprises multiple transaction 
sessions, and each transaction is extracted using the regular expressions defined from the CFG. The extraction procedure 
follows the algorithm developed here (the EJ Parser algorithm). The expensive part of the algorithm is the conversion 
module, i.e. structuring the extracted entities into a relational dataset. The extracted elements are stored in a database. 

Information Extraction 

Information about customer transactions is embedded in both ATM messages and host messages, which 
follow a language rule. Both regular and non-regular patterns exist within the journal. A non-regular language must 
include an infinite number of words, and if a language includes an infinite number of words, there is no bound on the size 
of the words in the language. Within the language rule, regular expressions generate or describe all strings in the 
language, while finite automata recognize a specific string in the language. This helps in creating the host message 
template. Most banks in Nigeria design the host message template themselves, following rules based on central bank 
regulations. The same principle was adopted to develop the production rule needed in the regex design. The non-terminal 
notations used in the production rule are given in Figure 5. 
Nonterminal    Meaning                    Sample

ATM_JOUR       ATM journal                NIL

TRANSACTION    ATM transaction within     05:59:46 -> TRANSACTION START
               a session                  05:59:47 TRACK 2 DATA: 506123*********1040
                                          05:59:57 PIN ENTERED
                                          06:00:11 AMOUNT 1700000 ENTERED
                                          06:00:11 TRANSACTION REQUEST ACAAAB
                                          06:00:14 TRANSACTION REPLY NEXT 100 FUNCTION 2086
                                          06:00:14 TVR: 8000040000, TSI: 6000
                                          06:00:21 CASH REQUEST: 17000000
                                          06:00:21 CASH 1:1,17;
                                          06:00:25 CASH PRESENTED
                                          04\12\15 06:01 10442211
                                          506123 1040
                                          0131 000516444576 0041068415
                                          WITHDRAW NGN17000.00
                                          FROM ....8415
                                          LEDGER NGN1368.06
                                          AVAIL NGN1368.06
                                          06:00:27 CASH TAKEN
                                          06:00:32 CARD(506123*********1040) TAKEN
                                          06:00:36 <- TRANSACTION END

ATM_MSG        ATM message                05:59:46 -> TRANSACTION START
                                          05:59:47 TRACK 2 DATA: 506123*********1040
                                          05:59:47 TRANSACTION REQUEST ACID
                                          05:59:48 TRANSACTION REPLY NEXT 018 FUNCTION 5000

HOST_MSG       Host message               04\12\15 06:01 10442211
                                          506123 1040
                                          0131 000516444576 0041068415
                                          WITHDRAW NGN17000.00
                                          FROM ....8415
                                          LEDGER NGN1368.06
                                          AVAIL NGN1368.06
                                          06:00:27 CASH TAKEN

PAN_TSN        PAN (card no.) with TSN    PAN = 506123 1040 and TSN = 0131
               (transaction serial no.)

CUST_TRANS     Customer transaction       WITHDRAW NGN17000.00
               detail                     FROM ....8415
                                          LEDGER NGN1368.06
                                          AVAIL NGN1368.06

EVENT          Transaction event          WITHDRAW, INQUIRY, INTERBANK TRANSFER,
                                          MINISTATEMENT, THIRD PARTY PAYMENT, etc.


Figure 5: Non-terminal notations for designing the ATM journal CFG. 


The Developed Production Rule Is Given as Follows: 


ATM_JOUR     -> TRANSACTION
TRANSACTION  -> ATM_MSG HOST_MSG ATM_MSG
ATM_MSG      -> INFO | ERRMSG
INFO         -> pre_transaction_info+ | post_transaction_info+
ERRMSG       -> errmsg+
HOST_MSG     -> ε | HOST_MSG
HOST_MSG     -> date<SP>time<SP>terminal<LF>PAN_TSN<LF>CUST_TRANS<LF>HOST_COMMENT
PAN_TSN      -> PAN<SP>TSN<SP>UNICODE | [PAN | TSN<SP>UNICODE]
PAN          -> digit+ char* digit+
TSN          -> digit[4]
UNICODE      -> ε | digit[6]
CUST_TRANS   -> EVENT<SP>AMOUNT<LF>ACCT<LF>LEDGER<LF>AVAIL<LF>
EVENT        -> ε | ‘Withdraw’ | ‘Inquiry’ | ‘Transfer’ | ‘Virtual top’ | ‘Third party payment’ |
                ‘Cash deposit’ | ‘Mini statement’ | ‘Advance Prepaid’
AMOUNT       -> ε | CURR MONEY
CURR         -> ‘ngn’ | ‘usd’ | ‘cfa’ | ‘gbp’
MONEY        -> digit+ ? digit+.digit[2] | digit+.digit[2]
ACCT         -> ‘From’ [.]*<SP>ACCTNO
ACCTNO       -> digit* | word
LEDGER       -> ε | ‘Ledger’<SP>AMOUNT
AVAIL        -> ε | ‘Avail’<SP>AMOUNT
HOST_COMMENT -> ε | transaction_comment
<SP>         -> \t
<LF>         -> \r | \r\n | \n
digit        = [0-9]
word         = \w+
errmsg       = \w+

Figure 6: The developed production rule 
The Following Named Entities and Regex Chunks Were Considered Based on the Production Rule 


• Transaction Date = (?<Date>\d{2}[/|\\]\d{2}[/|\\]\d{2}) 

• Transaction Time = (?<Time>\d{2}:\d{2}) 

• ATM used = (?<Terminal>\w+) 

• PAN = (?<PAN>\d*[.|\*]*\d+) 

• Transaction Serial No = (?<TSN>\d{4}) 

• Withdraw = (?<Transaction>(WITHDRAW)?[ ]+(?<CurrencyCode>\w{3})?(?<Amount>\d*[.]\d+)?) 

• Inquiry = (?<Transaction>(INQUIRY)[ ]*) 

• Transfer = (?<Transaction>(INTERBANK TRANSFER)[ ]*(?<CurrencyCode>\w{3})?(?<Amount>\d*[.]\d+)?)\r\n(FROM 
(?<FromAccount>([.]|\w)+) TO (?<ToAccount>([.]|\w)+)) 

• Mini statement = (?<Transaction>(MINISTATEMENT)) 

• Bill payment = (?<Transaction>(THIRD PARTY PAYMENT)[ ]*(?<CurrencyCode>\w{3})?(?<Amount>\d*[.]\d+)?)\r\n 

The placeholders such as (?<Date>), (?<Time>), and (?<Transaction>) are named groups: variables that hold the 
matches from the input string. 
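Named groups of this kind can be tried out with Python's `re` module, which spells them `(?P<name>...)` rather than the .NET-style `(?<name>...)`. The sketch below is an adaptation for illustration, not the authors' expressions: the three patterns are simplified, and `HOST_MESSAGE` reproduces the host message lines of the Wincor sample in Figure 3 (B) with LEDGER and AVAIL on separate lines, which is an assumption about the original layout.

```python
import re
from typing import Dict

# Simplified adaptations of the Date/Time/Terminal, TSN and Withdraw chunks.
HEADER_RE = re.compile(
    r"(?P<Date>\d{2}[/\\]\d{2}[/\\]\d{2})\s+(?P<Time>\d{2}:\d{2})\s+(?P<Terminal>\w+)"
)
TSN_RE = re.compile(r"^(?P<TSN>\d{4})\b", re.MULTILINE)
WITHDRAW_RE = re.compile(
    r"(?P<Transaction>WITHDRAW)[ ]+(?P<CurrencyCode>[A-Z]{3})(?P<Amount>\d+[.]\d{2})"
)

# Host message taken from the Wincor sample (layout partly assumed).
HOST_MESSAGE = (
    "04\\12\\15 06:01 10442211\n"
    "506123 1040\n"
    "0131 000516444576 0041068415\n"
    "WITHDRAW NGN17000.00\n"
    "FROM ....8415\n"
    "LEDGER NGN1368.06\n"
    "AVAIL NGN1368.06\n"
)


def extract_entities(host_message: str) -> Dict[str, str]:
    """Collect the named groups of every pattern that matches the host message."""
    entities: Dict[str, str] = {}
    for pattern in (HEADER_RE, TSN_RE, WITHDRAW_RE):
        match = pattern.search(host_message)
        if match:
            entities.update(match.groupdict())
    return entities
```

Each group name becomes a key of the returned dictionary, which is the role the placeholders play as variables in the regex chunks.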


THE DEVELOPED ALGORITHM 

EJParser Algorithm 


• Input: read the EJ file as a file stream; 

• Declare ejstream as string: 

    Let ejstream = FileStream.Read(EJ) as string; 

• Apply the divide-and-conquer paradigm: 

    Step 1: Split ejstream into an array of transaction sessions using the transaction delimiters. 
        Let sessionDelimiters = {transaction start, transaction end}; 
        Let ejsessionArray = Split(ejstream, sessionDelimiters); 
        /* Create object array */ 
        Let data = object of data array; 
        /* Initialize index */ 
        Let i = 0; 

    Repeat 

        Step 2: Let ejsession = ejsessionArray[i]; 
            - Match all transaction event(s) in ejsession against the defined production rule using regex; 
            - Let matchArray = Match(ejsession, regex); 

        Step 3: Tokenize the transaction events in matchArray into fields based on the regex placeholders, 
                and add them to the data object. 
            if (matchArray != null) 
                Tokenize(matchArray, {pan, tsn, date, time, terminal, transtype, currency, 
                    amount, account, ledger, avail, comment, error, status}); 
                /* Save extracted entities in data object */ 
                Add(data, {pan, tsn, date, time, terminal, transtype, currency, amount, 
                    account, ledger, avail, comment, error, status}); 
            else goto Step 4; 

        Step 4: Increment the index by 1. 
            Let i = i + 1; 

    Until i > ejsessionArray.Length - 1; 


Figure 7: EJParser Algorithm 
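The match, tokenize, and accumulate loop of Steps 2 to 4 can be sketched in Python as shown below. It is a simplified illustration rather than the authors' implementation: the sessions are assumed to have been split already, `EVENT_RE` is a hypothetical pattern that covers only the WITHDRAW and INQUIRY events, and a session with no match is skipped, mirroring the "else goto Step 4" branch.

```python
import re
from typing import Dict, List

# Hypothetical pattern covering only two event types, for illustration.
EVENT_RE = re.compile(
    r"(?P<transtype>WITHDRAW|INQUIRY)"
    r"(?:[ ]+(?P<currency>[A-Z]{3})(?P<amount>\d+[.]\d{2}))?"
)


def parse_sessions(sessions: List[str], pattern: re.Pattern = EVENT_RE) -> List[Dict[str, object]]:
    """Match each session against the pattern and collect one row per matched event."""
    data: List[Dict[str, object]] = []
    i = 0
    while i < len(sessions):
        # Step 2: match all transaction events in the current session.
        matches = list(pattern.finditer(sessions[i]))
        # Step 3: tokenize each match into named fields; nothing happens if there is no match.
        for match in matches:
            row: Dict[str, object] = {
                name: value for name, value in match.groupdict().items() if value is not None
            }
            row["session"] = i
            data.append(row)
        # Step 4: move on to the next session.
        i += 1
    return data
```

Because every session is visited once and the loop does a bounded amount of work per matched event, the sketch is in line with the roughly linear running time claimed for the algorithm.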
The algorithm decomposes the EJ file into transaction entities, as expressed mathematically in equations (1) 
and (2): 

$$f(t, j) = \sum_{t=1}^{n} \mathrm{Match}(s_t) \tag{1}$$

$$\mathrm{extract}(t, j) = \sum_{i=1}^{n} f(t_i, j) + \varepsilon \tag{2}$$

Equation (1) gives the sum of the matched sessions found in a transaction cycle; it is a function of the financial 
transactions t in the EJ j. Equation (2) is an aggregate of equation (1) over the entities found in the transactions, and n is 
the total number of transactions in the EJ. 

Here s_1, s_2, ..., s_{t=n} are the transaction sessions of the ejstream, each s_t being obtained by the 
divide-and-conquer method; t_i is a transaction containing information, j is the journal, and ε is the error handler. The 
EJParser algorithm, with its embedded regular expressions, uses a top-down parsing technique to extract the transaction 
details of ATM customers. For the EJ file of each day's transactions, the time to decompose the entire file is given by 
equation (3): 

$$T(n) = T(\mathrm{Decom}(ejstream)) = T(\mathrm{Match}(ejsession)) + T(\mathrm{Tok}(event)) + T(\mathrm{Com}(tokens)) \tag{3}$$

where Decom = decompose, Tok = tokenize, and Com = combine. 

The EJParser algorithm extracts the information in approximately O(n) running time. 

RESULT ANALYSIS 


The algorithm was implemented as a software application using Microsoft .NET technology and was tested with a 
live EJ from a Wincor Nixdorf ATM. Figure 8 shows the user interface of the application with the named entities extracted. 

An EJ file containing 724 transactions performed on 4 December 2015 was collected from a Nigerian 
bank. The application took about 10 s to extract the named classes from the 724 transactions. The actual transaction 
events performed at the ATM that day are listed in Figure 9, and the extracted entities are shown in Figure 10. 




[Screenshot of the application with a Wincor journal file loaded; the status bar reads “724 Matches; Parsing time: 00:00:10.4204844”.] 



Figure 8: Application User Interface With Data Extracted. 
Transaction Events      Frequency in the EJ

ADVANCE PREPAID         10
CHANGE PIN              4
INQUIRY                 103
INTERBANK TRANSFER      2
UNKNOWN*                311
WITHDRAW                294

Total                   724


Figure 9: Actual Transaction Events Performed on the Wincor ATM 


* Note: Some transactions are tagged unknown because they were not consummated. 


Class/Entity         Actual Count    Correctly Extracted

PAN                  724             724
TSN                  724             724
Terminal             724             724
Date                 724             724
Time                 724             724
Transaction Type     724             724
Amount               306             306
Avail Balance        326             314
Opcode               724             724
Comment              74              74

Total                5,774           5,762


Figure 10: Wincor Data Extraction 
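As a rough cross-check of the totals in Figure 10 (an illustrative calculation, not a figure reported by the study, and not necessarily how the study computed its accuracy), the share of entities extracted correctly and the share missed are:

```latex
\frac{5762}{5774} \approx 0.9979,
\qquad
\frac{5774 - 5762}{5774} = \frac{12}{5774} \approx 0.0021
```

All 12 missed entities belong to the Avail Balance class (326 actual against 314 extracted), and the overall ratio is of the same order as the accuracy of about 99.7% reported for the algorithm.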


Performance Evaluation 

The following statistical measures of performance were used to evaluate the regular expressions: accuracy, true 
positive rate (recall or sensitivity), misclassification rate, precision, and F-measure. 

Some symbols are defined as follows in order to establish the equations: 

O - the entities to be identified 

R_x - the input regular expression (regex) 

E_j - the electronic journal document 

Suppose M(R_x, E_j) represents the set of matches obtained by evaluating regex R_x over an electronic journal 
(EJ) collection E_j. The outputs are defined over four possible outcomes: 

M_{T+}(R_x, E_j) = { x ∈ M(R_x, E_j) : x is an instance of O }: the true positive matches for R_x. 

M_{T-}(R_x, E_j) = { x ∉ M(R_x, E_j) : x is not an instance of O }: the true negatives for R_x. 

M_{F+}(R_x, E_j) = { x ∈ M(R_x, E_j) : x is not an instance of O }: the false positive matches for R_x. 

M_{F-}(R_x, E_j) = { x ∉ M(R_x, E_j) : x is an instance of O }: the false negatives for R_x. 

The regex R_x is designed to identify instances of O. The following metrics were used to evaluate the regex and 
validate the extraction quality in the search space. 

To calculate the accuracy, A: 

$$A(R_x, E_j) = \frac{M_{T+}(R_x, E_j) + M_{T-}(R_x, E_j)}{\text{actual total entities}} \tag{4}$$

To calculate the misclassification rate, M: 

$$M(R_x, E_j) = \frac{M_{F+}(R_x, E_j) + M_{F-}(R_x, E_j)}{\text{actual total entities}} \tag{5}$$

To calculate the precision, P: 

$$P(R_x, E_j) = \frac{M_{T+}(R_x, E_j)}{M_{T+}(R_x, E_j) + M_{F+}(R_x, E_j)} \tag{6}$$

To calculate the true positive rate (recall or sensitivity), R: 

$$R(R_x, E_j) = \frac{M_{T+}(R_x, E_j)}{M_{T+}(R_x, E_j) + M_{F-}(R_x, E_j)} \tag{7}$$

To calculate the F-measure or score: 

$$F_1 = \frac{2 \cdot P \cdot R}{P + R} \tag{8}$$
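Equations (4) to (8) translate directly into code. The sketch below is illustrative only: the `Counts` class and the function names are hypothetical, "actual total entities" is assumed to equal TP + FP + TN + FN, and the counts used to exercise it should be made-up values rather than the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """True/false positive and negative counts for one entity class."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def total(self) -> int:
        # Assumption: the "actual total entities" is the sum of all four outcomes.
        return self.tp + self.fp + self.tn + self.fn


def accuracy(c: Counts) -> float:
    return (c.tp + c.tn) / c.total            # equation (4)


def misclassification(c: Counts) -> float:
    return (c.fp + c.fn) / c.total            # equation (5)


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp)               # equation (6)


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn)               # equation (7)


def f1_measure(c: Counts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)                # equation (8)
```

Under the stated assumption about the denominator, accuracy and misclassification rate always sum to one, which is a convenient sanity check on any set of counts.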


Figure 11 shows the confusion matrix of all the classes extracted from the Wincor EJ, while Figure 12 shows the 
classification analysis of five randomly selected entities (‘Comment’, ‘Avail Balance’, ‘PAN’, ‘Transaction Type’, and 
‘Amount’) in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). 


T = 5,762          Extracted class

Actual class       A     B     C     D     E     F     G     H     I     J

A                724     0     0     0     0     0     0     0     0     0
B                  0   724     0     0     0     0     0     0     0     0
C                  0     0   724     0     0     0     0     0     0     0
D                  0     0     0   724     0     0     0     0     0     0
E                  0     0     0     0   724     0     0     0     0     0
F                  0     0     0     0     0   724     0     0     0     0
G                  0     0     0     0     0     0   306     0     0     0
H                  0     0     0     0     0     0     0   314     0     ?
I                  0     0     0     0     0     0     0     0   724     0
J                  0     0     0     0     0     0     0     0     0    74

(The entry in row H, column J is printed as “B” in the source and is not legible.)


Figure 11: Wincor Confusion Matrix for 5,762 Data Extracted 

A = PAN, B = TSN, C = Terminal, D = Date, E = Time, F = Transaction Type, G = Amount, H = Avail Balance, 
I = Opcode, J = Comment 
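The per-class TP, FP, FN, and TN counts used in the classification analysis can be read off a confusion matrix in a one-vs-rest fashion. The sketch below illustrates this with a hypothetical 3-class matrix (not the Wincor data); the function name `per_class_counts` is an assumption, and rows are taken to be actual classes and columns extracted classes, as in Figure 11.

```python
from typing import Dict, List


def per_class_counts(matrix: List[List[int]], k: int) -> Dict[str, int]:
    """One-vs-rest counts for class k; rows are actual classes, columns are extracted classes."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # class k extracted as class k
    fn = sum(matrix[k]) - tp                   # class k extracted as something else
    fp = sum(row[k] for row in matrix) - tp    # other classes extracted as class k
    tn = total - tp - fn - fp                  # everything not involving class k
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical matrix: five items of class 1 were wrongly extracted as class 2.
MATRIX = [
    [50, 0, 0],
    [0, 40, 5],
    [0, 0, 20],
]
```

In this made-up matrix the five misplaced items appear as false negatives for class 1 and as false positives for class 2, the same pattern the study reports between the ‘avail balance’ and ‘comment’ entities.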
Confusion matrix for the Comment class: 

    TP = 74      (actual comments that were correctly classified as comments) 
    FP = 3       (other classes that were incorrectly labeled as comments) 
    FN = 0       (comments that were incorrectly marked as other classes) 
    TN = 5650    (all the remaining classes correctly classified as non-comments) 

Confusion matrix for the PAN class: 

    TP = 724     (actual PANs that were correctly classified as PANs) 
    FP = 0       (other classes that were incorrectly labeled as PAN) 
    FN = 0       (PANs that were incorrectly marked as other classes) 
    TN = 5033 
Figure 12: Classification Analysis for Selected Classes on Wincor EJ 

Figure 13: Performance Measure on Wincor EJ 

Note: Pr. = Precision, Re. = Recall, Acc. = Accuracy, Mis. = Misclassification, F = F-measure 
CONCLUSIONS 

The study has demonstrated the use of regular expressions to extract financial information from the ATM 
electronic journal. The electronic journal of a Wincor ATM was examined as a semi-structured document and 
transformed into structured data. An EJ Parser algorithm was established, implemented, and tested. The EJ was broken 
down into subunits of transactions using the divide-and-conquer concept, and each unit was extracted in turn using a 
pattern extractor. The pattern extractor implemented a text mining process based on a linguistic analysis of the EJ using a 
context-free grammar, and the information extraction module of the algorithm adopted regular expressions as the 
classifier. The entities of interest to the banks were identified as named entities for the recognition task. 

The algorithm was tested with a collection of data, and its performance was evaluated using standard 
performance metrics: precision, accuracy, F-measure, and recall. The evaluation showed that the IE method has both a 
true positive rate and an extraction precision above 90%. The average speed of extraction is approximately 20s. The few 
exceptions were observed on the ‘comments’ and ‘avail balance’ entities. The overall accuracy of the IE is 99.7% and the 
average precision is 99%; banks, however, expect 100% accuracy and precision because of the financial implications 
involved. 

REFERENCES 

1. Garofalakis, M. N., Rastogi, R., and Shim, K. (1999). SPIRIT: Sequential pattern mining with regular expression 
constraints. In VLDB 99, 7-10. 

2. Gomaa, H. (2011). Software modeling and design: UML, use cases, patterns, and software architectures. 
Cambridge University Press. 

3. Kawtrakul, A. (1995). A Computational Model for Writing Production Assistant System. In Proceedings of 
NLPRS’95, Natural Language Processing Pacific Rim Symposium, 4-7. 

4. Kawtrakul, A., and Yingsaeree, C. (2005). A unified framework for automatic metadata extraction from electronic 
document. In Proceedings of the International Advanced Digital Library Conference. Nagoya, Japan. 

5. Khalifa, S. S., and Saadan, K. (2013). The Formal Design Model of an Automatic Teller Machine 
(ATM). World, 3(4). 

6. Li, Y., Krishnamurthy, R., Raghavan, S., Vaithyanathan, S., and Jagadish, H. V. (2008). Regular expression 
learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language 
Processing, 21-30. Association for Computational Linguistics. 

7. Tan, A. H. (1999). Text mining: The state of the art and the challenges. In Proceedings of the PAKDD Workshop 
on Knowledge Discovery from Advanced Databases 8, 65. 

8. Wang, Y., Zhang, Y., Sheu, P. C., Li, X., and Guo, H. (2010). The Formal Design Model of an Automatic Teller 
Machine (ATM). International Journal of Software Science and Computational Intelligence (IJSSCI), 2(1), 
102-131. 


NAAS Rating: 2.73. Articles can be sent to editor@impactjournals.us 



